Incorporating User Input with Topic Modeling

Authors

  • Yi Yang
  • Shimei Pan
  • Jie Lu
  • Mercan Topkara
  • Doug Downey
Abstract

Topic models such as Latent Dirichlet Allocation (LDA) can discover topics from a large collection of documents in an unsupervised fashion, and are thus among the most popular text analysis tools currently in use. In practice, however, the topics discovered by a topic model do not always make sense to end users, and poor-quality topics substantially undermine the usability of a topic model system. Because topic models are unsupervised, it is difficult to incorporate a user's domain knowledge or feedback into the model. In this paper, we introduce a novel constrained LDA model, named cLDA, that is capable of incorporating user input in the form of document pairwise constraints. A document pairwise constraint is either a document must-link or a document cannot-link, each of which encodes a judgment about the semantic similarity of two documents. The effectiveness of the proposed cLDA model is shown in several aspects on a benchmark dataset.
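
To make the idea of document pairwise constraints concrete, the following is a minimal sketch of how must-links and cannot-links could be represented and checked against a model's output. It is not the cLDA model itself, which the abstract does not specify. The function names (`normalize`, `find_violations`) and the reading of "semantically similar" as "same dominant topic" are assumptions made purely for illustration.

```python
from typing import Dict, Iterable, List, Tuple

Pair = Tuple[int, int]  # a pair of document ids


def normalize(pair: Pair) -> Pair:
    """Order a pair so that (a, b) and (b, a) count as the same constraint."""
    a, b = pair
    if a == b:
        raise ValueError("a constraint must link two different documents")
    return (a, b) if a < b else (b, a)


def find_violations(
    must_links: Iterable[Pair],
    cannot_links: Iterable[Pair],
    dominant_topic: Dict[int, int],
) -> List[Tuple[str, int, int]]:
    """Return the constraints that a given topic assignment breaks.

    Illustrative assumption: a must-link is satisfied when both documents
    share a dominant topic, and a cannot-link when they do not.
    """
    violated = []
    for a, b in sorted({normalize(p) for p in must_links}):
        if dominant_topic[a] != dominant_topic[b]:
            violated.append(("must-link", a, b))
    for a, b in sorted({normalize(p) for p in cannot_links}):
        if dominant_topic[a] == dominant_topic[b]:
            violated.append(("cannot-link", a, b))
    return violated


# Hypothetical example: three documents, all assigned to topic 0.
topics = {0: 0, 1: 0, 2: 0}
print(find_violations([(0, 1)], [(2, 0)], topics))
```

A check like this only reports whether an already fitted model agrees with the user's constraints; a constrained model such as the one the abstract describes would instead take the constraints into account while fitting.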

Similar resources

Improving Topic Model Stability for Effective Document Exploration

Topic modeling has become a ubiquitous topic analysis tool for text exploration. Most existing work on topic modeling focuses on fitting topic models to input data. However, it ignores an important usability issue that is closely related to the end user experience: stability. In this study, we investigate the stability problem in topic modeling. We first report on the experiments conducte...

The human touch: How non-expert users perceive, interpret, and fix topic models

Topic modeling is a common tool for understanding large bodies of text, but is typically provided as a take it or leave it proposition. Incorporating human knowledge in unsupervised learning is a promising approach to create high-quality topic models. Existing interactive systems and modeling algorithms support a wide range of refinement operations to express feedback. However, these systems...

Delivering Content : The Right Content At the Right Time and at the Right Place to the Right Person

The number of User Generated Content providers and media contenders in Indonesia has been growing fast. Consequently, Keepo.me, as one of the Indonesian User Generated Content providers, needs to overcome several challenges in order to, at least, hold its ground in this high-risk competition. One of these challenges is to improve user engagement. Therefore, this master thesis has been carried out...

Topic Modeling using Topics from Many Domains, Lifelong Learning and Big Data

Topic modeling has been commonly used to discover topics from document collections. However, unsupervised models can generate many incoherent topics. To address this problem, several knowledge-based topic models have been proposed to incorporate prior domain knowledge from the user. This work advances this research much further and shows that without any user input, we can mine the prior knowle...

Active Learning with Constrained Topic Model

Latent Dirichlet Allocation (LDA) is a topic modeling tool that automatically discovers topics from a large collection of documents. It is one of the most popular text analysis tools currently in use. In practice, however, the topics discovered by LDA do not always make sense to end users. In this extended abstract, we propose an active learning framework that interactively and iteratively acqui...

Publication date: 2014